Here are the packages we will be using:
library(statnet)
library(igraph)
library(intergraph)
{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}.
When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable is the organization of edges in the network.
We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.
QAP is a resampling-based method that controls for non-independence
in network structure via random permutations. These permutations use
different arrangements of the rows and columns in the adjacency matrix.
Thus, the network structure is maintained but the arrangement of
individuals in the structure is randomized. This represents the null
hypothesis because it should eliminate any potential correlations
between ties and independent values.
What kinds of questions would you ask with our bonobo data?
There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, multiple/linear regressions, and logistic regressions. We will describe and demonstrate each regression using the bonobo data. We’ll also do the same with a more complex dataset from spider monkeys!
Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask: Do bonobos who groom each other correlate with bonobos who GG rub with one another?
What is our null hypothesis and alternative hypothesis in this question?
We can start by using the gcor() function to get a correlation coefficient between the two networks.
gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054
What does this correlation coefficient signify?
These networks don’t seem very correlated but let’s test for significance anyways. The qaptest() function uses Monte Carlo simulations of likelihood quantiles to test arbitrary graph-level statistics against a QAP null hypothesis.
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list (in this case the GG rubbing network)
g2=2, # use second graph in the list (in this case the groom network)
reps = 1000) # number of permutations to run
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.317
## p(f(perm) <= f(d)): 0.837
##
## Test Diagnostics:
## Test Value (f(d)): 0.1111054
## Replications: 1000
## Distribution Summary:
## Min: -0.4191705
## 1stQ: -0.1275187
## Med: 0.005050247
## Mean: -0.01361547
## 3rdQ: 0.1111054
## Max: 0.8534918
Under “Test Diagnostics”, the test value represents the correlation coefficient between grooming and GG rubbing networks. This is interpreted at the level of the dyad, pairs who groom each other are likely not related to pairs who GG rub with one another. We previously calculated this using the gcor() function.
Under “Estimated p-values”, we find statistics that tell us about how this observed correlation compares to what the correlation looks like when we randomly permute one of our networks. Note that these are both one-tailed p-values! We can see that our p-values are greater than 0.05, meaning we have evidence to fail to reject our null hypothesis. There is no significant correlation between grooming and GG rubbing networks.
We can also visualize the “Distribution Summary” using the plot() function.
plot(qap_cor)
The dotted line gives us the correlation among the observed networks. The curve shows the distribution of correlations across permutated networks. Our plot shows that across 1000 replications, very rarely do we see any correlation between grooming and GG rubbing networks in this bonobo population. The correlation tends to center around 0.
Linear regression QAPs test the extent to which predictor variables are affecting edge weights. Recall that our predictor variables are node attributes.
For example, if our predictor variable is rank, we can ask: Does rank influence the edge weight of GG rubbing? In other words, we are asking: Is a bonobo more likely to have more weighted edges for each unit increase in rank?
Before we dive in, what do you predict? Do you think there is be a difference between actors (senders) and receivers? What does it mean to have more edge weight?
Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).
First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.
nodes <- length(bonobo_attribute$actor) # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in
rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the rank of each actor is repeated over entire ROW of matrix
rank_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.
rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the rank of each receiver is repeated over entire COLUMN of matrix
rank_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our linear regression! First we are asking: Does rank affect how many times bonobos GG rubbed others (the weight of the edges they sent)? Null hypothesis is that rank does not affect how many times bonobods GG rub others (both high and low)
lrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4534205 -0.3093578 -0.2036344 0.5465795 0.8695587
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8507548 0.998 0.002 0.004
## x1 -0.1161796 0.008 0.992 0.027
##
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227 Adjusted R-squared: 0.03883
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.660544 -1.882904
## 1stQ -0.654187 -0.620122
## Median 0.091024 -0.007029
## Mean 0.077896 0.004873
## 3rdQ 0.849541 0.522875
## Max 2.693891 1.874392
To interpret this we can imagine the x axis being rank and the y axis being edge weight. For every unit increase in rank, edge weights are going down 0.1161796 (the estimate of x1 in the above coefficients). This relationship is slightly significant at 0.032 on the two tailed test. However, if we look at our adjusted r^2 value, our model is only explaining ~4% of the variability. And our F-statistic p value is too high to be significant at 0.111. Taken together, we can say that the rank of sender is not significantly explaining the edge weight of GG rubbing interactions.
Now let’s look at the receivers.
Does rank affect how many times bonobos received GG rubbing from others (the weight of the edges they receive)? The null hypothesis is the same as above, that rank of receiver does not affect the tnumber of times they are GG rubbed with.
lrqap2 <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(lrqap2)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3992884 -0.3393159 -0.2434687 0.6007116 0.8021901
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.64715036 0.993 0.007 0.012
## x1 -0.07247427 0.047 0.953 0.103
##
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423 Adjusted R-squared: -0.0001605
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.11668 -1.23372
## 1stQ -0.56714 -0.48077
## Median 0.04401 0.03890
## Mean 0.04428 0.01741
## 3rdQ 0.66126 0.51956
## Max 2.00229 1.34781
Just like with the actors/senders, let’s imagine that rank is on the x axis and edge weight is on the y axis. Again, we see a negative slope, though this one is even less steep at -0.07247427. The two tailed test p value for this variable is not significant. This model is explaining almost 0% of the variability, which means this model is really not a good fit for the residuals. We can imagine the points scattered far away from the line. The F-statistic p-value is too high to be significant at 0.3249, which is even higher than the sender p-value. Rank of receiver does not significantly explain the edge weight of GG rubbing interactions.
So we fail to reject the null hypothesis in this question too. But that begs the question, if rank is not influencing the edge weight then is there some other attribute that is?
We can include multiple predictor variables! We can also use another network as a predictor variable! By doing so, we are conducting a multiple regression QAP. For example, we can ask: Do sender rank and grooming relationships predict the number of times bonobos GG rubbed in the network? The null hypothesis is that neither rank of of the actor nor grooming relationships affect the number of times bonobos initiate GG rubbing (two tailed test).
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending, bonobo_groom_net)) # list of all predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.5370306 -0.3281637 -0.1926087 0.5464810 0.8485420
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8256023 0.996 0.004 0.008
## x1 -0.1210313 0.007 0.993 0.021
## x2 0.1253554 0.790 0.210 0.484
##
## Residual standard error: 0.4603 on 39 degrees of freedom
## Multiple R-squared: 0.07951 Adjusted R-squared: 0.0323
## F-statistic: 1.684 on 2 and 39 degrees of freedom, p-value: 0.1988
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1 x2
## Min -2.555843 -2.073998 -2.949316
## 1stQ -0.610343 -0.574280 -0.932728
## Median 0.104895 -0.017388 -0.143389
## Mean 0.085593 0.005107 0.095635
## 3rdQ 0.830378 0.507682 0.725760
## Max 2.608258 1.985326 10.532012
Are any of the above relationships significant when taken into the model together? Is this model a better fit than the others? How can we tell?
Logistic regression QAPs test the extent to which independent
variables are affecting the presence or absence of edges, but not the
weight of those edges. If you have a network with binary ties, we can
use this regression method to predict ties in the outcome network
(response variable)
Using our bonobo dataset, we can ask:
Does older age make an individual more or less likely to G-G rub
with other individuals?
In this example, our GG rubbing
network is the response variable while age is our predictor
variable.
If you imagine age (as the predictor variable) on the x axis, for every year a bonobo gets older, are they more likely to GG rub with more individuals? Will an older individual have more ties on the social network?
Similar to the multiple regression, we must first create a matrix that shows the rank of senders and receivers. As a reminder, senders are represented by ROWS while receivers are represented by COLUMNS in matrices
nodes <- length(bonobo_attribute$actor) # number of nodes in the data set
age <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we're interested in
age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create empty matrix to be filled
age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
age_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
for (i in 1:nodes){
age_receiving[,i] <- rep(age[i], nodes)}
age_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our logistic regression!
What is our null hypothesis for this question?
The netlogit() function performs a logistic regression of the network variable for the response variable (bonobo_ggr_net) on the network variables in the predictor variables (age_receiving and age_sending). The resulting fits and coefficients are tested against the null hypothesis.
logqap <- netlogit(bonobo_ggr_net, # response variable is a network object with an unweighted adjacency matrix
list(age_receiving, age_sending), # list of all predictor variables as network or matrix objects
reps = 1000) # number of draws for quantile estimation, 1000 reps is the default
summary(logqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.5524763 94.8670398 0.973 0.027 0.054
## x1 -0.4903676 0.6124013 0.033 0.967 0.065
## x2 -0.6817886 0.5057117 0.018 0.982 0.024
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
## 10.77041 on 3 degrees of freedom, p-value 0.01303442
## AIC: 53.45396 BIC: 58.66696
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2040994
## (Dn-Dr)/Dn: 0.1849811
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 11
## 1 4 2
##
## Total Fraction Correct: 0.6428571
## Fraction Predicted 1s Correct: 0.3333333
## Fraction Predicted 0s Correct: 0.6944444
## False Negative Rate: 0.8461538
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -2.04329 -1.57848 -2.01970
## 1stQ -0.73472 -0.53225 -0.63740
## Median -0.05205 0.03693 -0.05645
## Mean -0.03918 0.02587 -0.04178
## 3rdQ 0.63445 0.69756 0.55504
## Max 2.04528 1.63277 1.99509
As a reminder, we are testing whether age affects the likelihood of giving and receiving GG rubbing. In the output above, x1 represents the first variable in the model (age_receiving), and x2 represents the second variable in the model (age_sending).
There is a significantly negative relationship between age of sender and likelihood of number of edges in the GG rubbing network. In other words, for every year older, a female bonobo has -0.6817886 fewer GG rubbing edges for initiating. They are less likely to initiate GG rubbing as they get older. However, we cannot say anything about getting older and receiving GG rubbing events because the relationship was not significant for the two tailed hypothesis. Though we can note the relationship is also negative for this. The Chi-squared goodness of fit test p-value is signficant.
A follow up question given these results could be are older bonobos overall less likely to participate in GG rubbing? We could answer this with a linear regression QAP.
Our bonobo data was a little anti-climactic. But sometimes, that’s the nature of small sample primatology work. To make things a little more exciting, let’s run the three QAPs again with some spider monkey data.
Anzà, S., Demuru, E., & Palagi, E. (2021). Sex and grooming as exchange commodities in female bonobos’ daily biological market. Scientific reports, 11(1), 19344.
David, H. A. (1987). Ranking from unbalanced paired-comparison data. Biometrika, 74(2), 432–436. https://doi.org/10.1093/biomet/74.2.432
De Vries, H., Stevens, J. M. G., & Vervaecke, H. (2006). Measuring and testing the steepness of dominance hierarchies. Animal Behaviour, 71(3), 585–592. https://doi.org/10.1016/j.anbehav.2005.05.015
Luke, Douglas A. 2015. A User’s Guide to Network Analysis in R. Springer.
Rimbach, R., Bisanzio, D., Galvis, N., Link, A., Di Fiore, A., & Gillespie, T. R. (2015). Brown spider monkeys (Ateles hybridus): A model for differentiating the role of social networks and physical contact on parasite transmission dynamics. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1669), 20140110. https://doi.org/10.1098/rstb.2014.0110
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge University Press.